Large sample scaling limit to a diffusion process 
of Markov chain Monte Carlo methods 
Abstract

We study the poor behavior of Markov chain Monte Carlo (MCMC) methods in a
large sample framework. We define weak consistency to measure the convergence
rate of a Monte Carlo procedure. This property is studied through the
convergence of a step Markov process to a diffusion process. We apply weak
consistency to a popular data augmentation method for a simple mixture model.
This Monte Carlo method is known to work poorly when one of the mixture
proportions is close to 0. We show that it is not (local) consistent
but is (local) weakly consistent. As an alternative, we propose a
Metropolis-Hastings algorithm which is local consistent for the same model.
These results come from a weak convergence property of Monte Carlo procedures,
which is difficult to obtain from the Harris recurrence approach.
Keywords: Markov chain Monte Carlo; Asymptotic Normality; Diffusion
process

1 Introduction 

The Markov chain Monte Carlo (MCMC) method has become an essential tool in
any study that involves a complicated posterior calculation. Various new
MCMC methods have been developed in the last decades. This research focuses
on the efficiency of those MCMC methods. One useful measure of efficiency
is the ergodicity of the transition kernel of an MCMC method. There are many
studies on sufficient conditions for ergodicity: see the reviews [14] and [13]
and textbooks such as [10] and [9]. See [11] and [12] for other measures of
efficiency. The paper [6] took a different approach to studying the efficiency
of MCMC methods.
Consider a model

$$\{p(dx|\theta); \theta \in \Theta\}$$

with prior distribution $p_\Theta$ on $\Theta$. Let $x_n = (x^1, x^2, \ldots, x^n)$ be an observation from
the model and write $p(d\theta|x_n)$ for its posterior distribution. MCMC generates
a Markov chain $\theta(0), \theta(1), \ldots$ whose empirical mean tends to the
posterior mean. The convergence behavior of the Markov chain is important
from a practical point of view. Usually, a Foster-Lyapunov type drift condition
is established for fixed $x_n$ to show geometric ergodicity of the Markov chain.
However, the construction of an efficient drift condition is technically difficult
since the transition kernel has a complicated structure defined by $p(d\theta|x_n)$ and
related probability measures. On the other hand, the behavior of $p(d\theta|x_n)$ as
$n \to \infty$ is well known, and it seems possible to construct a general framework
to study the behavior of MCMC. Therefore, from both theoretical and practical points of
view, it seems more natural to consider asymptotics not only in $\theta(0), \theta(1), \ldots$
but also in $x_n$ through $n \to \infty$. For that purpose, it is natural to study the
law $M_n(\cdot; x_n) = \mathcal{L}(\theta(0), \theta(1), \ldots | x_n)$ as a random variable. Then $M_n$ is just a
random variable and it is easy to consider its convergence properties under a suitable
metric. This was the starting point of [6]. In that paper, we defined consistency
and local consistency as measures of good behavior of MCMC.

Unfortunately, the behavior of MCMC is sometimes poor even though it is
geometrically ergodic. This phenomenon can also be studied within the framework of
[6]. Using this framework, degeneracy and local degeneracy can be defined as
measures of poor behavior of MCMC [7]. In that paper, the usual data augmentation
(DA) method for the cumulative link model was shown to be local degenerate,
and for a smaller number of categories, changing the latent structure by
marginal augmentation was shown to improve the behavior.

Thus both good and bad behavior of MCMC have been studied. One question is still
unanswered: if a method A is local consistent and a method B is local degenerate but
B requires shorter calculation time, which should one choose? In this paper we
propose a rate of convergence (other than ergodicity) which may be useful for
choosing between methods such as A and B above. We define (local) weak consistency for
MCMC. As an example, we consider DA for a simple mixture model. Though
this model may not be so attractive for applications, it is closely related to more
general models: for example, DA for $pN(t, 1) + (1 - p)N(0, 1)$ ($p \in [0, 1]$, $|t| < T$)
under true model N(0, 1) behaves similar to our model which is conjectured to 
have a rate n~^. Moreover, we can observe several convergence rate through 
the model: from rt^^/^ to n^^ . Therefore it is useful to know the properties of 
local weak consistency. For other practical problem see an application paper [5] 
which has one convergence rate n~^. 

More precisely, we consider a model

$$p(dx|\theta) = (1 - \theta)F_0(dx) + \theta F_1(dx) \tag{1.1}$$

where each $F_i$ ($i = 0, 1$) is a probability measure on $X$ with density $f_i$, with a uniform prior on $\theta$. Assume we have an
i.i.d. sequence $x_n = (x^1, \ldots, x^n)$ from the above model. We can construct a DA strategy
$\theta \leftarrow \mathrm{DA}(\theta)$ as follows:

1. For each $i = 1, \ldots, n$, flip a coin with head probability $\theta f_1(x^i)/((1 - \theta)f_0(x^i) + \theta f_1(x^i))$.

2. Generate $\theta$ from $\mathrm{Beta}(n_1 + 1, n_0 + 1)$, where $n_1$ is the number of heads and $n_0$
is the number of tails.






Iterate $\theta(i + 1) \leftarrow \mathrm{DA}(\theta(i))$ from $\theta(0)$ for a certain length $m$. The posterior
distribution is approximated by

$$\frac{1}{m}\sum_{i=0}^{m-1}\delta_{\theta(i)}$$

where $\delta_x$ is the Dirac measure at $x$. This DA works quite poorly if the true proportion
$\theta$ is very small. In fact, DA is local degenerate in such a situation and, moreover,
it has the rate $\epsilon^{-1}n^{1/2}$: it is local $\epsilon^{-1}n^{1/2}$-weakly consistent and local
$\epsilon^{-1}n^{1/2}$-strongly degenerate. This result comes from the fact that the trajectory of DA
tends to a path of the stochastic process defined by

$$dX_t = b(X_t, z)dt + \sigma(X_t, z)dW_t$$

where $b(h, z) = \alpha_1 + hz - h^2 I$, $\sigma(h, z)^2 = 2h$, $I$ is the Fisher information
matrix and $z$ corresponds to the scaled maximum likelihood estimator (Theorem
3.2). It is probably well recognized that the trajectory of a poorly behaved MCMC
looks like a path of a diffusion process (see Figure 1). This result is the first
validation of this observation.
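
As a rough illustration of the limiting diffusion described above, the following sketch simulates $dX_t = (\alpha_1 + X_t z - X_t^2 I)dt + \sqrt{2X_t}\,dW_t$ with an Euler-Maruyama scheme. The function name, the step size, the default parameter values and the truncation of the discretised path at 0 are assumptions made only for this illustration; they are not part of the results of this paper.

```python
import numpy as np

def simulate_limit_diffusion(alpha1=1.0, z=0.0, fisher_info=1.0,
                             x0=1.0, t_end=10.0, dt=1e-3, seed=0):
    """Euler-Maruyama sketch for dX = (alpha1 + X z - X^2 I) dt + sqrt(2 X) dW.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        x = path[k]
        drift = alpha1 + x * z - x * x * fisher_info
        diffusion = np.sqrt(2.0 * max(x, 0.0))
        x_new = x + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal()
        # The discretised path can step below 0; truncating it there is an
        # assumption of this sketch, not a property of the exact process.
        path[k + 1] = max(x_new, 0.0)
    return path

path = simulate_limit_diffusion()
```

Plotting `path` against time gives a picture that can be compared qualitatively with a slowly mixing DA trajectory.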

For comparison, we will construct a local consistent MCMC in Section 3.2,
which is a kind of Metropolis-Hastings algorithm. Numerical results show that
when $\theta_0$ is away from 0 and 1, both methods are comparable. However, when
$\theta_0$ is close to 0 or 1, the proposed method works much better than DA. These
properties may seem strange from the viewpoint of the Harris recurrence approach, since both MCMC
methods are uniformly ergodic, which means that both have good convergence
properties. This indicates that in some cases, the "local" approach
(consistency) can explain the behavior of an MCMC algorithm better than the
"global" (Harris recurrence) approach.

The goals of this paper are the following:

1. to provide a set of sufficient conditions for weak consistency.

2. to illustrate the difference between consistency and weak consistency. An
example for the simple mixture model is provided.

3. to propose an efficient MH algorithm which uses an approximation of the
posterior distribution. This method may be applicable to general mixture
problems.

This paper is organized as follows. In Section 2 we define (local) weak
consistency and strong degeneracy, generalizing [6] to continuous Monte
Carlo procedures. Theorem 2.8 is the main result of this paper. These results
are used in Section 3 to show local weak consistency of DA for (1.1). In Section
3.2, we introduce a new MCMC method which is local consistent for the same
model. Numerical results are provided in Section 4, which show the effect of the
rate of weak consistency.
1.1 Notation



$\mathbf{R}^+ = [0, \infty)$, $\mathbf{N}_0 = \{0, 1, \ldots\}$. $\Theta$ is a Polish space with a metric $d$ such that
$d(a, b) \le 1$. For a measurable space $(X, \mathcal{X})$, $\mathcal{P}(X)$ is the space of probability
measures on $X$. A probability transition kernel (Ptk) from $(X, \mathcal{X})$ to $(Y, \mathcal{Y})$ is
a function $K(dy; x)$ such that $K(A; \cdot)$ is $\mathcal{X}$-measurable for any $A \in \mathcal{Y}$ and
$K(\cdot; x) \in \mathcal{P}(Y)$ for any $x \in X$. We may write "$K$ is a Ptk from $X$ to $Y$" for
short.

We write $\|\nu\| = \sup_{A \in \mathcal{X}} |\nu(A)|$ for the total variation of a signed measure $\nu$.
Denote by $BL_1(\Theta)$ the set of functions $\psi$ on $\Theta$ such that $|\psi(a) - \psi(b)| \le d(a, b)$
and $|\psi| \le 1$. For two measures $\nu$ and $\mu$,

$$\sup_{\psi \in BL_1(\Theta)} |\nu(\psi) - \mu(\psi)|$$

is called the bounded Lipschitz metric, denoted by $w(\mu, \nu)$.

2 Local weak consistency of MCMC 
2.1 Definition of local weak consistency 

In this section, we review consistency and degeneracy and define weak
consistency and strong degeneracy. Consider a probability space $(X_n, \mathcal{X}_n, P_n)$. For
an observation $x_n \in X_n$, let $\theta \leftarrow M_n(\theta, x_n)$ denote an iterative strategy
such as DA. Generate $\theta(i + 1) \leftarrow M_n(\theta(i), x_n)$ ($i = 0, 1, \ldots$) from an initial guess
$\theta(0)$. This procedure defines a conditional law of

$$\theta_\infty = (\theta(0), \theta(1), \ldots)$$

given $x_n$, and we denote it by (the same symbol) $M_n = M_n(d\theta_\infty; x_n)$, which is
a Ptk from $X_n$ to $\Theta^\infty$ (assuming measurability). Assume that a "target
distribution" $\Pi_n(d\theta; x_n)$, which we want to obtain, is approximated by an "empirical
distribution"

$$e_m(d\theta; \theta_\infty) := \frac{1}{m}\sum_{i=0}^{m-1}\delta_{\theta(i)}(d\theta)$$

where $\delta_\theta$ is the point mass at $\theta$.^1 A pair $\mathcal{M}_n = (M_n, e)$ is called a Monte Carlo
procedure in [6], where $e = \{e_m; m = 1, 2, \ldots\}$. We expect $e_m(\cdot; \theta_\infty)$ to be close
to $\Pi_n(\cdot; x_n)$, which leads to the notion of a risk function. To construct a risk,
consider the bounded Lipschitz metric between them:

$$w(e_m(\cdot; \theta_\infty), \Pi_n(\cdot; x_n)).$$

It is not a risk function since it depends on the non-deterministic variables $\theta_\infty$
and $x_n$. We simply integrate out $\theta_\infty$ and $x_n$ with respect to $M_n(d\theta_\infty; x_n)$ and
$P_n(dx_n)$. Thus we obtain a risk function as follows:

$$R_m(\mathcal{M}_n, \Pi_n) = \int\int w(e_m(\cdot; \theta_\infty), \Pi_n(\cdot; x_n))M_n(d\theta_\infty; x_n)P_n(dx_n).$$

^1 Although we only consider the above $e_m$ in this paper, we may take a different choice of $e_m$,
which is a Ptk from $\Theta^\infty$ (or $X_n \times \Theta^\infty$) to $\Theta$. See [6] for details.




Definition 2.1 (Consistency). A sequence of Monte Carlo procedures $\{\mathcal{M}_n; n = 1, 2, \ldots\}$
is said to be consistent to $\{\Pi_n; n = 1, 2, \ldots\}$ if $R_{m_n}(\mathcal{M}_n, \Pi_n) \to 0$ for
any $m_n \to \infty$.

When no confusion can arise, we omit the target $\{\Pi_n; n = 1, 2, \ldots\}$.
By consistency of the posterior distribution, $\Pi_n(d\theta; x_n)$ usually tends to
a point mass $\delta_{\theta_0}$ under weak assumptions. In that case, consistency of a Monte Carlo
procedure does not provide much information, and we consider local
consistency instead. We consider a scaling such as $\theta \mapsto r_n(\theta - \hat\theta)$ or $\theta \mapsto r_n(\theta - \theta_0)$ for
some $\theta_0$ and $r_n \to \infty$ such as $r_n = n^{1/2}$ or $n$. We denote by $\Pi_n^*$ and $e_m^*$ the
scaled versions of $\Pi_n$ and $e_m$. Let $\mathcal{M}_n^* = (M_n, e^*)$ for $e^* = \{e_m^*; m = 1, 2, \ldots\}$.

Definition 2.2 (Local Consistency). If $\{\mathcal{M}_n^*; n = 1, 2, \ldots\}$ is consistent to
$\{\Pi_n^*; n = 1, 2, \ldots\}$, $\{\mathcal{M}_n; n = 1, 2, \ldots\}$ is said to be local consistent to $\{\Pi_n; n = 1, 2, \ldots\}$.

Define

$$R'_m(\mathcal{M}_n) := \int\int w(e_m(\cdot; \theta_\infty), e_1(\cdot; \theta_\infty))M_n(d\theta_\infty; x_n)P_n(dx_n).$$

It measures the distance between the empirical distributions using $m$ iterations and
only one iteration. When the mixing property of a Monte Carlo procedure is quite
poor, all of $\theta(0), \ldots, \theta(m - 1)$ have similar values. Therefore $e_m(\cdot; \theta_\infty)$ and $e_1(\cdot; \theta_\infty)$
are also similar, which yields a small value of $R'_m$.

Definition 2.3 (Degeneracy). A sequence of Monte Carlo procedures $\{\mathcal{M}_n; n = 1, 2, \ldots\}$
is said to be degenerate if $R'_m(\mathcal{M}_n) \to 0$ for any $m \in \mathbf{N}$. If $\{\mathcal{M}_n^*; n = 1, 2, \ldots\}$
is degenerate, we call $\{\mathcal{M}_n; n = 1, 2, \ldots\}$ locally degenerate.

As discussed in the Introduction, degeneracy is sometimes too broad as a measure
of poor behavior. Roughly speaking, among degenerate Monte Carlo
procedures, there are relatively good ones and bad ones. Even if a Monte Carlo procedure
is degenerate, it sometimes tends to $\Pi_n$ at a slower rate. We call this
convergence property weak consistency. Similarly, we also define strong degeneracy.
We can distinguish degenerate Monte Carlo procedures by these rates.

Definition 2.4 (Weak Consistency). A sequence of Monte Carlo procedures
$\{\mathcal{M}_n; n = 1, 2, \ldots\}$ is said to be $r_n$-weakly consistent to $\{\Pi_n; n = 1, 2, \ldots\}$ if
$R_{m_n}(\mathcal{M}_n, \Pi_n) \to 0$ for any $m_n$ such that $m_n/r_n \to +\infty$. When $\{\mathcal{M}_n^*; n = 1, 2, \ldots\}$
is $r_n$-weakly consistent, we call $\{\mathcal{M}_n; n = 1, 2, \ldots\}$ local $r_n$-weakly
consistent.

Definition 2.5 (Strong Degeneracy). A sequence of Monte Carlo procedures
$\{\mathcal{M}_n; n = 1, 2, \ldots\}$ is said to be $r_n$-strongly degenerate if $R'_{m_n}(\mathcal{M}_n) \to 0$ for any
$m_n \to \infty$ such that $m_n/r_n \to 0$. If $\{\mathcal{M}_n^*; n = 1, 2, \ldots\}$ is $r_n$-strongly degenerate, we call
$\{\mathcal{M}_n; n = 1, 2, \ldots\}$ locally $r_n$-strongly degenerate.
2.2 Continuous Monte Carlo procedures

We prepare some continuous analogues of the previous section. Let $D_\infty = D[0, \infty)$
be the totality of càdlàg functions from $[0, \infty)$ to $\Theta$ and let $d_\infty(\cdot, \cdot)$ be
its Skorohod metric on $D_\infty$. Many results carry over from Sections 3 and 4 of
[6], which studied $\Theta^\infty$.

Let $(X, \mathcal{X}, P)$ be a probability space. Let $M$ be a Ptk from $X$ to $D_\infty$. For
$\theta = \{\theta(t); t \ge 0\} \in D_\infty$, let $e_t$ be an "empirical measure" defined by

$$e_t(\psi; \theta) = \frac{1}{t}\int_0^t \psi(\theta(s))ds$$

where $\psi$ is an $\mathbf{R}^+$-valued measurable function. We simply write $e$ for $\{e_t(\cdot; \theta); t > 0, \theta \in D_\infty\}$.
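
For a pure step path the empirical measure reduces to a weighted sum over the holding intervals. The sketch below assumes that $e_t(\psi; \theta)$ is the time average $(1/t)\int_0^t \psi(\theta(s))ds$ and that the path is given by its jump times and the values it takes between jumps. The function name and its arguments are hypothetical and chosen only for this illustration.

```python
import numpy as np

def empirical_average(psi, jump_times, values, t):
    """Time average (1/t) * int_0^t psi(theta(s)) ds for a step path.

    The path takes values[i] on [jump_times[i], jump_times[i+1]),
    with jump_times[0] == 0; the last value is held forever.
    """
    jump_times = np.asarray(jump_times, dtype=float)
    values = np.asarray(values, dtype=float)
    # Clip the end points of each holding interval to [0, t].
    starts = np.minimum(jump_times, t)
    ends = np.minimum(np.append(jump_times[1:], np.inf), t)
    durations = ends - starts
    return float(np.sum(psi(values) * durations) / t)

# Path equal to 2 on [0, 1), 4 on [1, 3) and 6 from time 3 onwards.
avg = empirical_average(lambda x: x, [0.0, 1.0, 3.0], [2.0, 4.0, 6.0], t=4.0)
```

In this small example the holding times within $[0, 4]$ are 1, 2 and 1, so the average of the identity function is $(2 + 8 + 6)/4 = 4$.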

Definition 2.6. A pair $\mathcal{M} = (M, e)$ is called a continuous Monte Carlo
procedure.

Note that "continuous" means that $M(\cdot; x)$ is defined on continuous time $\mathbf{R}^+$, not on discrete
time $\mathbf{N}_0$. It does not mean that $M(\cdot; x)$ has continuous paths. Throughout
this section, $\Pi$ is a Ptk from $X$ to $\Theta$. Using the bounded Lipschitz metric $w$, we
measure the difference

$$w(e_t(\cdot; \theta), \Pi(\cdot; x))$$

and define an average loss

$$R_t(\mathcal{M}, \Pi) = \int\int w(e_t(\cdot; \theta), \Pi(\cdot; x))M(d\theta; x)P(dx).$$

As in the previous section, we define consistency for a sequence of Monte Carlo
procedures $\mathcal{M}_n$ and $\Pi_n$ on $(X_n, \mathcal{X}_n, P_n)$.

Definition 2.7. Let $\{\mathcal{M}_n; n \ge 1\}$ be a sequence of continuous Monte Carlo
procedures. We call $\{\mathcal{M}_n; n \ge 1\}$ consistent to $\{\Pi_n; n \ge 1\}$ if for any $t_n \to \infty$

$$\lim_{n \to \infty} R_{t_n}(\mathcal{M}_n, \Pi_n) = 0.$$

Recall some terminology related to ergodicity and stationarity (see also
Section 10.1 of [2] and Section 17.1.1 of [9]). For $x = \{x(t); t \ge 0\} \in D_\infty$, let
$T_s x = \{x(t + s); t \ge 0\}$.

• A probability measure $m$ on $D_\infty$ is said to be (strictly) stationary if $m(A) = m(T_t^{-1}A)$
for any $t > 0$. If $m$ is stationary, $\pi(A) = m(\{x; x(0) \in A\})$ is
called an invariant distribution.

• A Borel subset $A$ of $D_\infty$ is called invariant if $T_s^{-1}A = A$ ($s > 0$). Let $\mathcal{A}$
be the σ-algebra generated by the invariant sets.

• $m$ is called ergodic if $m(A) = 0$ or $1$ for any $A \in \mathcal{A}$.
If $m$ is stationary and ergodic, we have the ergodic theorem (see Theorem 10.2.1
of [2]). We prepare some terminology for $\mathcal{M} = (M, e)$.

• If $M(\cdot; x)$ is stationary or ergodic $P$-almost surely, we call $M$ and $\mathcal{M}$
stationary or ergodic.

• A Ptk $\Pi$ is called invariant for $M$ if $\Pi(\cdot; x)$ is an invariant probability
measure for $M(\cdot; x)$ $P$-a.s.

• We say that $M$ has no fixed time of discontinuity if $M(\cdot; x)$ has none $P$-a.s.

The following consistency theorem for sequences of stationary random
variables is an easy extension of Theorem 2.1 of [6].

Theorem 2.8. Assume that a sequence of stationary continuous Monte Carlo
procedures $\mathcal{M}_n$ ($n = 1, 2, \ldots$) tends to a non-random continuous Monte Carlo
procedure $\mathcal{M}_\infty = (M_\infty, e)$ in the following sense:



and $M_\infty$ is ergodic with no fixed time of discontinuity. Then $\mathcal{M}_n$ is consistent
to its invariant $\Pi_n$ ($n = 1, 2, \ldots$).

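Proof. The proof is almost the same as that of Theorem 2.1 of [6]. We only show
the following for a probability measure $\pi$ on $\Theta$:

1. For any $t > 0$, $T > t$, and stationary measure $m$ on $D_\infty$,

$$\int w(e_T(\cdot; \theta), \pi)m(d\theta) \le m(w(e_t, \pi)) + \frac{t}{T}. \tag{2.1}$$

2. Suppose that for $t > 0$, a sequence of probability measures $\{m_n; n \ge 1\}$
converges to $m$, which is continuous at $t$. Then $F(m_n) \to F(m)$ for $F(m) = m(w(e_t, \pi))$.

First we show 1. We split the interval $[0, T]$ into subintervals of length $t$.
Then, by convexity of $w$ in its first argument, the left hand side of (2.1) is bounded by

$$\frac{t}{T}\sum_{k=0}^{[T/t]-1}\int w(e_t(\cdot; T_{kt}\theta), \pi)m(d\theta) + \left(1 - \frac{t}{T}\left[\frac{T}{t}\right]\right)\int w(\tilde e(\cdot; \theta), \pi)m(d\theta),$$

where $\tilde e(\cdot; \theta)$ denotes the empirical measure of $\theta$ over the remaining interval $[t[T/t], T]$.
Since $m$ is stationary, the first term equals $(t/T)[T/t]m(w(e_t, \pi))$ and the
second term is bounded by $1 - (t/T)[T/t]$. Since $1 - x^{-1} < [x]/x \le 1$, by
taking $x = T/t$, we obtain 1.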
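Second we show continuity. It is sufficient to show continuity of $\theta^* \mapsto w(e_t(\cdot; \theta^*), \pi)$
at $\theta^* = \theta \in D_\infty$ if $\theta$ is continuous at $t$. Take any $\epsilon > 0$ and a
strictly increasing continuous map $\lambda : [0, t] \to [0, t]$ such that

$$\sup_{u \in [0, t]} |\theta^*(u) - \theta(\lambda(u))| + \sup_{u \in [0, t]} |u - \lambda(u)| < \epsilon + d_t(\theta^*, \theta)$$

where $d_t$ is the Skorohod metric on $D_t$. Observe

$$w(e_t(\cdot; \theta^*), e_t(\cdot; \theta)) \le w(e_t(\cdot; \theta^*), e_t(\cdot; \theta\lambda)) + w(e_t(\cdot; \theta\lambda), e_t(\cdot; \theta))$$

and the first term is bounded by $\sup_{u \in [0, t]} |\theta^*(u) - \theta(\lambda(u))|$ and the second term
is bounded by $\sup_{u \in [0, t]} |u - \lambda(u)|$, since $e_t(\psi; \theta\lambda) = \int \psi(\theta(u))d\lambda^{-1}(u)$. Hence

$$w(e_t(\cdot; \theta^*), e_t(\cdot; \theta)) \le d_t(\theta^*, \theta).$$

Since $d_\infty(\theta^*, \theta) \to 0$ implies $d_t(\theta^*, \theta) \to 0$ if $\theta$ is continuous at $t$, $\theta^* \mapsto w(e_t(\cdot; \theta^*), \pi)$
is continuous at $\theta$. Hence by the same arguments as in the proof of Theorem 2.1
of [6], the claim follows. □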

Corollary 2.9. Let $\{\mathcal{M}_n; n = 1, 2, \ldots\}$ be a sequence of stationary Monte Carlo
procedures and let $M(\cdot; z)$ be a Ptk from a Polish space $D$ to $D_\infty$ that is ergodic
with no fixed time of discontinuity for each $z \in D$. Assume $Z_n$ to be a $P_n$-tight
random variable. If

$$\int w_\infty(M_n(\cdot; x_n), M(\cdot; Z_n(x_n)))P_n(dx_n) \to 0$$

then $\{\mathcal{M}_n; n \ge 1\}$ is consistent to its invariant $\{\Pi_n; n = 1, 2, \ldots\}$.

Proof. It follows from Theorem 2.8 by an argument similar to that of Corollary A.6. □

2.3 A lemma for time scaling 

Let $\mathcal{M}_n = (M_n, e)$ be a sequence of (discrete) Monte Carlo procedures and
$r_n$ be a positive increasing sequence tending to $\infty$. If $\theta_\infty = (\theta(0), \theta(1), \ldots) \sim M_n(d\theta; x_n)$,
then consider two different time scalings:

$$\theta^1(t) = \theta([r_n t]), \quad \theta^2(t) = \theta(N_t)$$

where $[x]$ is the integer part of $x$ and $N_t$ is an independent Poisson process
with $E[N_t] = r_n t$. These scalings define two continuous Monte Carlo procedures
$\mathcal{M}_n^i = (M_n^i, e)$ for $i = 1, 2$, where $M_n^i(d\theta; x)$ is the law of $\{\theta^i(t); t \ge 0\}$ given $x$. It
is obvious that $r_n$-weak consistency of $\{\mathcal{M}_n; n \ge 1\}$ is equivalent to consistency
of $\{\mathcal{M}_n^1; n \ge 1\}$. Since $M_n^1$ is not stationary even if $M_n$ is, we will consider $M_n^2$
instead. The following lemma states that consistency of $\{\mathcal{M}_n^2; n \ge 1\}$
is sufficient for that of $\{\mathcal{M}_n^1; n \ge 1\}$ (in fact, they are asymptotically equivalent).
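
The two clocks can be compared on a stored chain. The following is a minimal sketch of both time scalings evaluated on an increasing time grid; the function name `time_scalings`, the grid and the truncation of the index at the end of the stored chain are assumptions made for this illustration only.

```python
import numpy as np

def time_scalings(chain, r_n, t_grid, seed=0):
    """Return theta^1(t) = chain[floor(r_n t)] and theta^2(t) = chain[N_t]
    on an increasing, non-negative time grid."""
    rng = np.random.default_rng(seed)
    chain = np.asarray(chain)
    t_grid = np.asarray(t_grid, dtype=float)
    # Deterministic clock: integer part of r_n * t.
    idx1 = np.floor(r_n * t_grid).astype(int)
    # Poisson clock: N_t has independent Poisson increments with mean
    # r_n * (length of the time step).
    increments = rng.poisson(r_n * np.diff(t_grid, prepend=0.0))
    idx2 = np.cumsum(increments)
    # Truncate at the end of the stored chain (an assumption of this sketch).
    last = len(chain) - 1
    return chain[np.minimum(idx1, last)], chain[np.minimum(idx2, last)]

chain = np.arange(1000)            # placeholder chain: theta(i) = i
t_grid = np.linspace(0.0, 1.0, 11)
theta1, theta2 = time_scalings(chain, 100.0, t_grid)
```

With the placeholder chain $\theta(i) = i$, `theta1` and `theta2` are simply the two clock readings, and their ratio approaches 1 as $r_n t$ grows.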
Lemma 2.10. If $\{\mathcal{M}_n^2; n \ge 1\}$ is consistent to $\{\Pi_n; n \ge 1\}$, then $\{\mathcal{M}_n^1; n \ge 1\}$
is also consistent to the same Ptks.

Proof. For $\theta^2$, write $T_0 = 0$, $T_i = \inf\{t > T_{i-1}; \Delta N_t \ne 0\}$. Then

$$e_t(\psi; \theta^1) = \frac{1}{r_n t}\sum_{i=0}^{[r_n t]-1}\psi(\theta(i)) + (1 - [r_n t]/r_n t)\psi(\theta([r_n t]))$$

and

$$e_t(\psi; \theta^2) = \frac{1}{t}\sum_{i=0}^{\infty}\psi(\theta(i))(T_{i+1} \wedge t - T_i \wedge t)$$

where $a \wedge b := \min\{a, b\}$. Write $e'_t(\psi; \theta^2)$ for $e_t(\psi; \theta^2)$ with $N_t$ replaced by $[r_n t]$. The
difference caused by the replacement is bounded by $|1 - N_t/[r_n t]|$ in the bounded
Lipschitz metric, which tends to 0 in probability. On the other hand, since

where the conditional expectation is taken over $\{T_i; i \ge 0\}$, by Jensen's inequality,
we have

$$w(e_t(\cdot; \theta^1), \Pi_n(\cdot; x_n)) \le w(e'_t(\cdot; \theta^2), \Pi_n(\cdot; x_n))$$

which proves the claim. □



3 Data augmentation for mixture model 

Let $(X, \mathcal{X})$ be a measurable space and let $\Theta = [0, 1]$. For $\epsilon \in [0, 1]$, let $F_\epsilon(dx) = f_\epsilon(x)dx$
be probability measures on $(X, \mathcal{X})$ having the same support. Consider
the following simple mixture model:

$$p(dx|\theta) = (1 - \theta)F_0(dx) + \theta F_\epsilon(dx). \tag{3.1}$$

We write $n$ i.i.d. observations from $F_0 = p(\cdot|\theta = 0)$ as $x_n = (x^1, \ldots, x^n)$ and
$P_n(dx_n) = \prod_{i=1}^n F_0(dx^i)$. We assume $\epsilon = \epsilon_n \to 0$ and $\epsilon n^{1/2} \to \infty$. Write
$r_n = \epsilon^{-1}n^{1/2}$ and $\lambda_n = \epsilon n^{1/2}$. There is an obvious relation $r_n = n\lambda_n^{-1}$. There
are two remarks to note here. First, the following arguments are also true
for $\epsilon = 1$ and the proofs are almost the same. Second, although we assume
that the true model is $F_0$, by contiguity, the consistency results also hold for
$\theta_0 = O(\lambda_n^{-1})$. In this section, we assume the following.

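Assumption 3.1. There exists $g : X \to \mathbf{R}$ such that

$$\int_X |r_\epsilon|^2 F_0(dx) = o(\epsilon^2),$$

where $r_\epsilon(x) = f_\epsilon(x)/f_0(x) - \epsilon g(x) - 1$. Moreover,

$$\int_X g(x)^2 F_0(dx) = I \in (0, \infty).$$

The prior distribution is assumed to be $p_\Theta = \mathrm{Beta}(\alpha_1, \alpha_0)$ for $\alpha_0, \alpha_1 > 0$.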
This assumption is stronger than quadratic mean differentiability of $F_\epsilon$ at
$\epsilon = 0$. However, for example, $F_\epsilon = N(\epsilon, \sigma^2)$ and $F_\epsilon = N(0, \sigma^2(1 - \epsilon)^2)$ satisfy
the above conditions. Note that $F_0(g) = 0$ and $F_0(r_\epsilon) = 0$.

3.1 Data augmentation strategy 

As already discussed in the Introduction, data augmentation (DA) $\theta \leftarrow \mathrm{DA}(\theta)$ is

(a) For each $i = 1, 2, \ldots, n$, flip a coin with head probability $\theta f_\epsilon(x^i)/((1 - \theta)f_0(x^i) + \theta f_\epsilon(x^i))$.

(b) Generate $\theta$ from $\mathrm{Beta}(\alpha_1 + n_1, \alpha_0 + n - n_1)$, where $n_1$ is the number of heads.

Run this iteration $\theta(i + 1) \leftarrow \mathrm{DA}(\theta(i))$ for a certain length from $\theta(0)$. Write
$\mathcal{M}_n$ for the stationary Monte Carlo procedure corresponding to DA. We consider the
scaling $\theta \mapsto \lambda_n\theta$ and write $\mathcal{M}_n^* = (M_n^*, e)$ for the scaled Monte Carlo procedure,
that is, $M_n^*(\cdot; x_n)$ is the law of $\{\lambda_n\theta(i); i \ge 0\}$ given $x_n$.
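
Steps (a) and (b) can be sketched as follows for the normal case $F_0 = N(0, 1)$, $F_\epsilon = N(\epsilon, 1)$. The function names, the default prior parameters, the starting value and the sample size are assumptions made for this illustration, and the chain below is not started from the posterior, so it is not the stationary version used in the theory.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1)."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def da_chain(x, eps, n_iter, theta0=0.5, alpha1=1.0, alpha0=1.0, seed=0):
    """Data augmentation for (1 - theta) N(0, 1) + theta N(eps, 1)
    with a Beta(alpha1, alpha0) prior on theta."""
    rng = np.random.default_rng(seed)
    f0 = normal_pdf(x, 0.0)
    f_eps = normal_pdf(x, eps)
    n = x.size
    theta = theta0
    out = np.empty(n_iter)
    for i in range(n_iter):
        # (a) one coin flip per observation
        p_head = theta * f_eps / ((1.0 - theta) * f0 + theta * f_eps)
        n1 = int(np.sum(rng.random(n) < p_head))
        # (b) conjugate Beta update given the number of heads
        theta = rng.beta(alpha1 + n1, alpha0 + n - n1)
        out[i] = theta
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(200)       # observations from F_0 = N(0, 1)
chain = da_chain(x, eps=1.0, n_iter=500)
```

Multiplying `chain` by $\lambda_n = \epsilon n^{1/2}$ gives the scaled path whose behavior is described by the theorem that follows.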

Theorem 3.2. Under Assumption 3.1, $\{\mathcal{M}_n; n \ge 1\}$ is local $r_n$-weakly consistent
and local $r_n$-strongly degenerate.

Proof. We first prove local $r_n$-weak consistency of $\mathcal{M}_n$. If $\theta_\infty = (\theta(0), \theta(1), \ldots) \sim M_n(\cdot; x_n)$,
let $\theta^c(t)$ be as in (A.6) and let $M_n^c(\cdot; x_n)$ be the law of $\{\theta^c(t); t \ge 0\}$ given
$x_n$. Then by Lemma 2.10, local $r_n$-weak consistency of $\mathcal{M}_n$ follows if consistency
of $\mathcal{M}_n^c = (M_n^c, e)$ is proved. The latter follows from Corollary 2.9 and Corollary A.6,
which completes the proof of the first claim.

The latter claim is an easy corollary of Proposition A.5. Write $T_m$ for the $m$-th
jump time of $\theta^c$. Then for any $\delta > 0$ and for any $m_n/r_n \to 0$,

$$w(e_{m_n}(\lambda_n\theta), e_1(\lambda_n\theta)) \le \sup_{t \in [0, T_{m_n})} |\theta^c(t) - \theta^c(0)| \le w'_{\theta^c}(T_{m_n}) \le w'_{\theta^c}(\delta)$$

in probability as $n \to \infty$, where $w'_x(\delta)$ is the modulus of continuity defined on p. 122
of [1]. Since the law of $\theta^c$ tends to that of a diffusion process, by upper semi-continuity
of $x \mapsto w'_x(\delta)$, for $P(dz) = N(0, I)$,

$$\limsup_{n \to \infty} R'_{m_n}(\mathcal{M}_n^*) \le \limsup_{n \to \infty} \int \min\{1, w'_{\theta^c}(\delta)\}M_n^c(d\theta^c; x)P_n(dx) \le \int\int \min\{1, w'_{\theta}(\delta)\}M^*(d\theta; z)P(dz).$$

Hence the strong degeneracy follows by taking $\delta \to 0$. □

Remark 3.3. We assume stationarity of $M_n$. This means that $\theta(0)$ should be generated
from the posterior distribution. It is just a technical condition, and we can
replace the initial guess by any good $\theta(0)$. See Proposition 7 of [6].
3.2 Efficient MCMC method for simple mixture model

When DA does not work well, a Metropolis-Hastings (MH) algorithm is often used
as an alternative. In our simple mixture model, it works. For a Ptk $K(dy; x)$ and
measures $F(dx)$, $G(dx)$, write

$$F \otimes K(dx, dy) = F(dx)K(dy; x), \quad F \circ G(dx, dy) = F(dx)G(dy).$$

For a measure $H$ on $\Theta^2$, write $t(H)(dx, dy) = H(dy, dx)$.

3.2.1 Independent type Metropolis-Hastings algorithm in general 

Let $F$, $G$ be probability measures on $\Theta$ and assume $F(dx) = r(x)G(dx)$. Let

$$\alpha(h, g) = 1 \wedge \frac{r(g)}{r(h)}$$

where $a \wedge b = \min\{a, b\}$. The independent type MH (IMH) algorithm $h \leftarrow \mathrm{MH}(h)$
is the following procedure:

(a) Generate $g$ from $G$.

(b) Set $h = g$ with probability $\alpha(h, g)$. Otherwise, do nothing.

Write $\mathcal{M}$ for the Monte Carlo procedure defined by the above procedure with
$\theta(0) \sim G$ (not $F$). Write $K$ for its Ptk. Then

$$K(dy; x) = G(dy)\alpha(x, y) + \delta_x(dy)(1 - A(x))$$

where $A(x) = \int G(dy)\alpha(x, y)$ is the acceptance probability. This representation
yields

$$F \otimes K = (F \circ G) \wedge t(F \circ G) + (F \circ \delta)(1 - A)$$

where $\delta(x, dy) = \delta_x(dy)$. In particular, $K$ has $F$ as an invariant probability
measure.
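
Steps (a) and (b) of the IMH algorithm translate directly into code. The sketch below works with $\log r$ to avoid numerical underflow and only needs $r = dF/dG$ up to a multiplicative constant, since the constant cancels in the ratio $r(g)/r(h)$. The function name `imh_chain` and the toy target $F = \mathrm{Beta}(2, 5)$ with proposal $G = \mathrm{Uniform}(0, 1)$ are assumptions chosen only for illustration.

```python
import numpy as np

def imh_chain(log_r, sample_g, n_iter, seed=0):
    """Independent type MH; log_r(x) = log dF/dG(x) up to an additive constant."""
    rng = np.random.default_rng(seed)
    h = sample_g(rng)              # theta(0) ~ G, as in the text
    log_r_h = log_r(h)
    out = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        g = sample_g(rng)          # (a) propose from G
        log_r_g = log_r(g)
        # (b) accept with probability min{1, r(g) / r(h)}
        if np.log(rng.random()) < log_r_g - log_r_h:
            h, log_r_h = g, log_r_g
            accepted += 1
        out[i] = h
    return out, accepted / n_iter

# Toy example: F = Beta(2, 5), G = Uniform(0, 1), so r(x) is proportional
# to x * (1 - x)^4.
chain, acc = imh_chain(lambda x: np.log(x) + 4.0 * np.log1p(-x),
                       lambda rng: rng.random(), n_iter=20000)
```

In this toy example the chain average should be close to the mean $2/7$ of $\mathrm{Beta}(2, 5)$, which is one simple check that $F$ is the invariant distribution.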

Consider a probability space $(X_n, \mathcal{X}_n, P_n)$. Assume that $F_n$, $G_n$ are Ptks
from $X_n$ to $\Theta$ having ratio $r_n(x_n, y) = dF_n/dG_n(x_n, y)$. Write $\mathcal{M}_n$ for the MH procedure
constructed from $F_n(\cdot; x_n)$ and $G_n(\cdot; x_n)$ with $\theta(0) \sim G_n(\cdot; x_n)$. Note that $\mathcal{M}_n$ is
not stationary.

Let $D$ be a Polish space equipped with its Borel σ-algebra, let $d^n : X_n \to D$ be
measurable, let $F$, $G$ be Ptks from $D$ to $\Theta$ and write $r = dF/dG$. Assume
that $d \mapsto F(\cdot; d)$ and $d \mapsto G(\cdot; d)$ are continuous in total variation distance.
Write $H \circ d^n(dy; x_n) = H(dy; d^n(x_n))$ for $H = F, G$.

Proposition 3.4. Assume $\|(H_n - H \circ d^n)(\cdot; x_n)\| = o_{P_n}(1)$ for $H = F, G$ and that
$\mathcal{L}(d^n|P_n)$ is tight. If for any compact set $K \subset D$, there exists $c_K > 0$ such that
$F(\cdot; d) \le c_K G(\cdot; d)$ ($d \in K$), then $\mathcal{M}_n$ is consistent.






Proof. First we show that $\mathcal{M}_n$ can be regarded as a stationary Monte Carlo
procedure. Observe that for any $\epsilon > 0$, taking a compact set $K$ such that
$\limsup_{n \to \infty} P_n(d^n \in K^c) < \epsilon$,

$$\limsup_{n \to \infty} P_n(\{x_n; F_n(\cdot; x_n) > \epsilon + c_K G_n(\cdot; x_n)\}) \le \limsup_{n \to \infty} P_n(d^n \in K^c) < \epsilon.$$

Therefore by Proposition of [6], it is sufficient to check local consistency of the
stationary version of $\mathcal{M}_n$, that is, replacing the initial distribution $G_n$ by $F_n$.
Then by Lemma A.3 of [5] and Propositions 2, 3 of [6] with continuity of $F$, $G$, it
is sufficient to show ergodicity of the limit chain. This is clear by $F$-irreducibility,
which follows from $F(\cdot; d) \le c_K G(\cdot; d)$ ($d \in K$). □



3.2.2 Application to simple mixture model 

We consider a general procedure to construct random sequences from $p(d\theta|x_n)$,
which is the posterior distribution for the parametric family $\mathcal{P}_\Theta = \{p(dx|\theta); \theta \in \Theta\}$
with respect to a prior distribution $p_\Theta$. Assume we have $n$ i.i.d. copies
$x_n = (x^1, \ldots, x^n)$ from $p(dx|\theta_0)$.

(a) Construct a parametric family $\mathcal{Q}_\Theta = \{q(dx|\theta); \theta \in \Theta\}$, which is similar to
$\mathcal{P}_\Theta$, and set a prior distribution $q_\Theta$. Note that $\mathcal{Q}_\Theta$ may depend on $n$.

(b) Construct a quasi-posterior distribution $q(d\theta|x_n)$, which is the posterior
distribution for $\mathcal{Q}_\Theta$ with the prior distribution $q_\Theta$.

(c) For each $x_n$, construct $\theta \leftarrow \mathrm{MH}(\theta)$ for $F = p(d\theta|x_n)$ and $G = q(d\theta|x_n)$.

For the simple mixture model (3.1), there are the following two examples of
$\mathcal{Q}_\Theta$. Recall that $\lambda_n = \epsilon n^{1/2}$ and $r_n = \epsilon^{-1}n^{1/2}$. Write $\mathcal{M}_n$ for the Monte
Carlo procedure. Let $p^*(A|x_n) := p(\lambda_n(\theta - \theta_0) \in A|x_n)$ and
$q^*(A|x_n) := q(\lambda_n(\theta - \theta_0) \in A|x_n)$.

Example 3.5. Take $q(dx|\theta) = F_{\epsilon\theta}(dx)$ and take $q_\Theta = p_\Theta = \mathrm{Beta}(\alpha_1, \alpha_0)$. Then
for $\theta_0 = 0$,

$$\|q^*(\cdot|x_n) - p^*(\cdot|d = Z_n)\| = o_{P_n}(1),$$

where $p^*(\cdot|d)$ and $Z_n$ are defined in (A.5) and (A.1). In particular, $\mathcal{M}_n$ is
local consistent for $\theta_0 = 0$ by Proposition 3.4 and Propositions 2, 3 of [6], and
in fact it is also local consistent for $\theta_0 \in [0, 1]$ under some regularity conditions.
Recall that DA is not local consistent but local $r_n$-weakly consistent for $\theta_0 = 0$
(see Theorem 3.2).

Example 3.6. Write Fa{x^) = J x''Fa{dx). Let X = R'' Assume Fa{x) ^ 
Fi,(x) for a ^ b and Fa{x'^) < oo for a,b ^ [0,1]- For simplicity, assume 
Fo{x^) < oo. Take Q ^ {N{^i,a^),^i G R) where 

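$$\sigma^2 = 2^{-1}\left(F_0((x - F_0(x))^2) + F_\epsilon((x - F_\epsilon(x))^2)\right).$$

The element of $\mathcal{Q}$ nearest to $p(dx|\theta)$ in Kullback-Leibler distance is

$$q(\cdot|\theta) := N(\theta F_\epsilon(x) + (1 - \theta)F_0(x), \sigma^2).$$

Set $\mathcal{Q}_\Theta = \{q(\cdot|\theta); \theta \in \Theta\}$. The posterior distribution $q(d\theta|x_n)$ with uniform
prior is the truncated normal distribution with mean $\mu_\Theta$ and variance $\sigma_\Theta^2$ where

$$\mu_\Theta = \frac{n^{-1}\sum_{i=1}^n x^i - F_0(x)}{F_\epsilon(x) - F_0(x)}, \quad \sigma_\Theta^2 = \frac{\sigma^2}{n(F_\epsilon(x) - F_0(x))^2}.$$

Then for the scaled version $q^*(A|x_n) = q(\lambda_n(\theta - \theta_0) \in A|x_n)$,

$$\|q^*(\cdot|x_n) - N^+(\lambda_n\mu_\Theta, \tau^2)\|$$

tends in $P_n$-probability to 0 and

$$(Z_n, \lambda_n\mu_\Theta) \Rightarrow N\left(0, \begin{pmatrix} I & 1 \\ 1 & \tau^2 \end{pmatrix}\right)$$

where $N^+(\mu, \Sigma)$ is a truncated normal distribution on $\mathbf{R}^+$ and $\tau^2 = F_0((x - F_0(x))^2)/F_0(xg(x))^2$.
In particular, $\mathcal{M}_n$ is also local consistent for $\theta_0 \in [0, 1]$ by
Proposition 3.4 and Propositions 2, 3 of [6].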
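
A minimal sketch of the construction in this example is given below for the normal case $F_0 = N(0, 1)$, $F_\epsilon = N(\epsilon, 1)$, where $F_0(x) = 0$, $F_\epsilon(x) = \epsilon$ and $\sigma^2 = 1$, so that the quasi-posterior is $N(\bar x/\epsilon, 1/(n\epsilon^2))$ truncated to $[0, 1]$. The function name, the inverse-CDF sampler built on the standard library class `statistics.NormalDist`, the clipping constants and the simulated data (true proportion 0.3) are all assumptions of this sketch.

```python
import numpy as np
from statistics import NormalDist

def quasi_posterior_imh(x, eps, n_iter, seed=0):
    """IMH for the posterior of theta in (1 - theta) N(0, 1) + theta N(eps, 1)
    under a uniform prior, proposing from the truncated normal quasi-posterior."""
    rng = np.random.default_rng(seed)
    n = x.size
    mu = float(x.mean()) / eps                 # quasi-posterior mean
    sd = float(1.0 / (eps * np.sqrt(n)))       # quasi-posterior std. deviation
    prop = NormalDist(mu, sd)
    lo, hi = prop.cdf(0.0), prop.cdf(1.0)
    f0 = np.exp(-0.5 * x ** 2)
    f1 = np.exp(-0.5 * (x - eps) ** 2)

    def sample_g():
        # Inverse-CDF draw from the normal truncated to [0, 1].
        u = lo + (hi - lo) * rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)    # keep inv_cdf well defined
        return min(max(prop.inv_cdf(u), 0.0), 1.0)

    def log_r(theta):
        # log posterior minus log proposal density, up to constants
        log_post = np.sum(np.log((1.0 - theta) * f0 + theta * f1))
        log_q = -0.5 * ((theta - mu) / sd) ** 2
        return log_post - log_q

    h = sample_g()
    log_r_h = log_r(h)
    out = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        g = sample_g()
        log_r_g = log_r(g)
        if np.log(rng.random()) < log_r_g - log_r_h:
            h, log_r_h = g, log_r_g
            accepted += 1
        out[i] = h
    return out, accepted / n_iter

rng = np.random.default_rng(2)
n, eps = 400, 1.0
labels = rng.random(n) < 0.3                   # simulated data, theta_0 = 0.3
x = rng.standard_normal(n) + eps * labels
chain, acc = quasi_posterior_imh(x, eps, n_iter=5000)
```

Because the proposal does not depend on the current state, a high acceptance rate here indicates that the quasi-posterior is close to the true posterior.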



4 Numerical results 

We compare DA with MH through a numerical simulation. Consider a normal
mixture model $F_\epsilon(dx) = N(\epsilon, 1)$. We denote by $\theta(0), \theta(1), \ldots$ a path of an
MCMC method. First we look at paths of

$$n^{1/2}(\theta(i) - \hat\theta_n)$$

for $i = 0, 1, 2, \ldots$ of the two MCMC methods for one observation $x_n = (x^1, \ldots, x^n)$,
where $\hat\theta_n$ is the Bayes estimator for the $L^2$-loss function. The initial guess is the
moment estimator. Even for a relatively small sample size ($n = 50$), the path of
DA has much weaker mixing than that of MH. For a large sample size,
unlike MH, DA behaves like a path of a diffusion process (Figure 1).

Write $\bar\theta_m = m^{-1}\sum_{i=0}^{m-1}\theta(i)$. Tables 1-4 show the estimated values of the
standard error of

$$\lambda_n(\bar\theta_m - \hat\theta_n) \tag{4.1}$$

starting from the moment estimator, where $m$ is the number of iterations of the MCMC
methods, with $\epsilon = 1$ for Tables 1 and 2 and $\epsilon = n^{-1/4}$ for Tables 3 and 4. MH
behaves better than DA in terms of a smaller standard error in both cases.
However, in the former case ($\epsilon = 1$), $F_0$ and $F_\epsilon$ are far enough apart, which makes
DA relatively good. The former is $n^{1/2}$-weakly consistent and the latter is
$n^{3/4}$-weakly consistent. MH is local consistent for both examples.
Figure 1: Plot of paths of MCMC methods for n = 10^. The dashed line is a
path from DA and the solid line is MH.



Table 1: DA for $\epsilon = 1$

n       m = 10^2     m = 10^3     m = 10^4      m = 10^5
10      0.3199809    0.1012226    0.03217747    0.009997727
10^2    0.766464     0.2492395    0.07961785    0.0249476
10^3    2.075795     0.7136275    0.2223515     0.06771291


Table 2: MH for $\epsilon = 1$

n       m = 10^2     m = 10^3      m = 10^4      m = 10^5
10      0.1711105    0.05355567    0.01702631    0.00539654
10^2    0.3341676    0.1062388     0.03396667    0.01066741
10^3    0.6469089    0.2045675     0.06470591    0.02023579



Table 3: DA for $\epsilon = n^{-1/4}$

n       m = 10^2     m = 10^3     m = 10^4      m = 10^5
10      0.4008295    0.1290747    0.04067078    0.01303128
10^2    1.384782     0.5331395    0.1703108     0.05349205
10^3    2.94899      1.909526     0.6233598     0.2017869



Table 4: MH for $\epsilon = n^{-1/4}$

n       m = 10^2     m = 10^3      m = 10^4      m = 10^5
10      0.1655218    0.05205809    0.01647237    0.005216286
10^2    0.2995662    0.09484733    0.0302533     0.0095037
10^3    0.530985     0.167514      0.05249483    0.01674280
A Appendix

A.1 Some properties of simple mixture model

In this subsection, we address some properties of the simple mixture model. The
results in this subsection are just simple modifications of well known results.
First we show uniform local asymptotic normality. Write $P_{n,h}(dx_n) := \prod_{i=1}^n p(dx^i|\theta = \lambda_n^{-1}h)$ and

$$Z_n(x_n) = n^{-1/2}\sum_{i=1}^n g(x^i). \tag{A.1}$$

Then $Z_n \Rightarrow N(0, I)$ under $P_n = P_{n,0}$.
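
The convergence of $Z_n$ can be checked numerically in a simple case. The sketch below assumes the normal location family $F_\epsilon = N(\epsilon, 1)$, for which $f_\epsilon(x)/f_0(x) = \exp(\epsilon x - \epsilon^2/2) = 1 + \epsilon x + O(\epsilon^2)$, so that $g(x) = x$ and $I = 1$. The function name and the sample sizes are assumptions made for this illustration.

```python
import numpy as np

def simulate_z(n, n_rep, seed=0):
    """Draw n_rep independent copies of Z_n = n^{-1/2} sum_i g(x^i)
    under F_0 = N(0, 1), with g(x) = x (normal location family)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_rep, n))
    return x.sum(axis=1) / np.sqrt(n)

z = simulate_z(n=500, n_rep=4000)
sample_mean, sample_var = z.mean(), z.var()
```

Under these assumptions the sample mean and variance of `z` should be close to 0 and to $I = 1$, respectively.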

Lemma A.1. Under Assumption 3.1, for $L_{n,h} = dP_{n,h}/dP_n$ and $H > 0$,

$$\sup_{h \in [0, H]} |\log(L_{n,h}) - hZ_n + h^2 I/2| = o_{P_n}(1). \tag{A.2}$$

Proof. Assume $h \in [0, H]$. The likelihood ratio $L_{n,h}$ is

$$\prod_{i=1}^n (1 + h\lambda_n^{-1}(f_\epsilon(x^i)/f_0(x^i) - 1)) = \prod_{i=1}^n (1 + hs_n(x^i))$$

by taking $s_n(x) = n^{-1/2}g(x) + \lambda_n^{-1}r_\epsilon(x)$ (recall that $\lambda_n^{-1}\epsilon = n^{-1/2}$). Define $r$ by
$\log(1 + x) = x - x^2/2 + x^2 r(x)$; then $r(x) \to 0$ ($x \to 0$). With this notation,

$$\log(L_{n,h}) = h\sum_{i=1}^n s_n(x^i) - h^2\sum_{i=1}^n s_n(x^i)^2/2 + h^2\sum_{i=1}^n s_n(x^i)^2 r(hs_n(x^i)).$$

Then the first term on the right hand side is $h$ times

$$n^{-1/2}\sum_{i=1}^n g(x^i) + \lambda_n^{-1}\sum_{i=1}^n r_\epsilon(x^i) = Z_n + o_{P_n}(1)$$

by the assumption on $r_\epsilon$. By similar arguments, $\sum_{i=1}^n s_n(x^i)^2 \to I$ in probability.
Since $\sum_{i=1}^n s_n(x^i)$ tends to a normal distribution, $\max_{i=1,\ldots,n} |s_n(x^i)| = o_{P_n}(1)$
by the lower Berry-Esseen bound [3]. Hence

$$\sum_{i=1}^n s_n(x^i)^2 |r(hs_n(x^i))| \le \sum_{i=1}^n s_n(x^i)^2 \max_{i=1,\ldots,n} |r(Hs_n(x^i))| \tag{A.3}$$

tends in $P_n$-probability to 0 by Slutsky's lemma. From these convergences (A.2)
follows. □
By Theorem 2.1 of [8], $P_n$ and $P_{n,h_n}$ are mutually contiguous for any $h_n \to h > 0$,
and $P_n$ is mutually contiguous to

$$Q_n(dx_n) := \left(\int_0^H P_{n,h}(dx_n)p^*_\Theta(dh)\right)\left(\int_0^H p^*_\Theta(dh)\right)^{-1}$$

where $p^*_\Theta$ is the scaled prior distribution, that is, $p^*_\Theta([0, h]) = p_\Theta([0, \lambda_n^{-1}h])$. The
support of the scaled distribution is $[0, \lambda_n]$.

Lemma A.2. Under Assumption 3.1, for any $M_n \to \infty$, there exist a test
$\psi_n : X_n \to [0, 1]$ and constants $C_1$ and $C_2$ such that

$$P_n(\psi_n) \to 0, \quad P_{n,h}(1 - \psi_n) \le \exp(-C_1 h^2) \quad (\forall h > M_n C_2).$$

Proof. To make g bounded, let h^{x) = niax{mm{g{x), L}, —L} where i > 
is a constant such that J {h^ {x))^ Fo{dx) > 1/2. By central limit theorem, the 
probability of the following event tends to under P„ : 

n 
i=l 

Now we have 

pih'^ie) - Foih"^) = e I h'^{f,{x)/fo{x) - l)Ff,{dx) = eieFoih'^g) + Foih'^r,)) 
J X 

and Fo{h^g) > Fo{{h^)^) > 1/2 and Fo{h^r,) < LFo{\r,\) = op„(e). Therefore, 
there exists C > and G N such that for any n > N, 

p{h^\9) - Fa{h^) > COe 

where e = e„ only depends on n. By definition 

n 

A^, = {xr.-n-^'\Y. h\x') -pih'^m < ni/2(Fo(/i^) - pik'^lO)) + Mr,}. 

i=l 

For h = e\-^ > 2M„/C, n^/^{Fo{h^) -p{h^\e)) + M„ is smaller than -hC/2. 
By this observation, by Hocffding's inequality, P„^/i(yl^) is smaller than 

n 

P„,h({.T„;n-i/2(^ - pih'^m < -hC/2}) < cxp{^C'h^) 

i=l 

where C" > is a constant. Hence the claim follows by taking ipn = 1a„- CH 
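The posterior distribution is

$$p(d\theta|x_n) = \frac{\prod_{i=1}^n ((1 - \theta)f_0(x^i) + \theta f_\epsilon(x^i))p_\Theta(d\theta)}{\int_\Theta \prod_{i=1}^n ((1 - \vartheta)f_0(x^i) + \vartheta f_\epsilon(x^i))p_\Theta(d\vartheta)}.$$

Write $p^*(\cdot|x_n)$ for the scaled posterior distribution, that is, $p^*([0, h]|x_n) = p([0, \lambda_n^{-1}h]|x_n)$.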
Lemma A.3. Under Assumption 3.1, for any $M_n \to \infty$,

$$P_n\left(p^*([0,M_n]^c|x_n)\right) \to 0.$$

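Proof. Let $\psi_n$ be the test defined in Lemma A.2. Since $P_n(\psi_n) \to 0$ and $P_n$ and $Q_n$
are mutually contiguous, it is sufficient to show

$$Q_n\left((1-\psi_n)\,p^*([0,M_n]^c|x_n)\right) \to 0.$$

Rewrite $Q_n(dx_n) = \int_0^H L_{n,h}P_n(dx_n)\,p^*_\Theta(dh)/p^*_\Theta([0,H])$ and

$$p^*(dh|x_n) = \frac{L_{n,h}\,p^*_\Theta(dh)}{\int L_{n,h}\,p^*_\Theta(dh)} \le \frac{L_{n,h}\,p^*_\Theta(dh)}{\int_0^H L_{n,h}\,p^*_\Theta(dh)}.$$

We have

$$Q_n\left((1-\psi_n)\,p^*([0,M_n]^c|x_n)\right) \le \left(\int_{[0,M_n]^c}P_{n,h}(1-\psi_n)\,p^*_\Theta(dh)\right)p^*_\Theta([0,H])^{-1}.$$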

The value is bounded above by 

for $n > N$ where $\lambda_N^{-1}H < 1/2$. Therefore it tends to 0 and the claim follows. □

Let $p^*(dh|d)$ be a measure on $\mathbb{R}_+$ such that

$$p^*(dh|d) \propto \exp(hd - h^2I/2)\,h^{\alpha_1-1}\,dh \tag{A.5}$$

where $d \in \mathbb{R}$. For any signed measure $\nu$, we write $\nu^+$ for the positive part of
$\nu$. The following is a Bernstein-von Mises type convergence of our model.

Proposition A.4. Under Assumption 3.1,

$$P_n\left(\|p^*(\cdot|x_n) - p^*(\cdot|d = Z_n(x_n))\|\right) \to 0.$$

Proof. Let

$$r_{n,h}(x_n) = \log(L_{n,h}) - hZ_n + h^2I/2 + (\alpha_0 - 1)\log(1 - \lambda_n^{-1}h).$$

By Lemma A.1, for any $\delta > 0$, setting $A_n = \{x_n;\ \sup_{h\in[0,H]}|r_{n,h}(x_n)| < \delta\}$, we
obtain $P_n(A_n) \to 1$. By representation (A.4) with the fact

$$L_{n,h}\,p^*_\Theta(dh) = C\exp(hZ_n - h^2I/2 + r_{n,h})\,h^{\alpha_1-1}\,dh$$

for some constant $C > 0$, we obtain

$$1_{[0,H]}(h)\,p^*(dh|x_n) \le p^*(dh|d = Z_n)\exp(2\delta)\left(1 - p^*([0,H]^c|d = Z_n)\right)^{-1}$$

for $x_n \in A_n$. Therefore,

$$\|p^*(\cdot|x_n) - p^*(\cdot|d = Z_n)\| \le p^*([0,H]^c|x_n) + 2\left(p^*(\cdot|x_n) - p^*(\cdot|d = Z_n)\right)^+([0,H])$$

for $x_n \in A_n$. Since $p^*([0,H]^c|d = Z_n) \to 0$ in $P_n$-probability, taking $H \to \infty$ and
$\delta \to 0$, by Lemma A.3, the claim follows. □
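
As a purely numerical illustration (not part of the proof), the limit law (A.5) can be evaluated on a grid. The following Python sketch normalises the density $\exp(hd - h^2I/2)h^{\alpha_1-1}$ with the trapezoid rule. The function name `limit_posterior_density`, the grid and the parameter values are assumptions made for this example only.

```python
import numpy as np

def limit_posterior_density(h_grid, d, info, alpha1):
    """Normalised density of (A.5) on a strictly positive, increasing grid.

    d is the conditioning value (Z_n in the text), info plays the role of I,
    and alpha1 is the first Beta prior parameter. All names are hypothetical.
    """
    # Unnormalised log-density: h*d - h^2*I/2 + (alpha1 - 1)*log(h).
    log_w = h_grid * d - 0.5 * info * h_grid**2 + (alpha1 - 1.0) * np.log(h_grid)
    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(log_w - log_w.max())
    # Trapezoid rule for the normalising constant.
    mass = np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(h_grid))
    return w / mass

h = np.linspace(1e-6, 10.0, 20001)
density = limit_posterior_density(h, d=0.0, info=1.0, alpha1=1.0)
```

With `d = 0`, `info = 1` and `alpha1 = 1` the density reduces to a half-normal law, which gives a simple sanity check for the grid approximation.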
A.2 Convergence to a diffusion process

Consider a time and state scale change of DA. As in Section 2.3, let $N^n_t$ be a Poisson
process on a stochastic basis with $E[N^n_t] = r_n t$. Write

$$\theta^n(t) = \lambda_n\theta(N^n_t). \tag{A.6}$$

We consider a convergence property of the pure step Markov process $\theta^n = \{\theta^n(t);\ t \ge 0\}$.
More precisely, if we denote $M^*_n$ for the law of $\theta^n$ given $x_n$, we show convergence
of $M^*_n$ to a law $M^*$ defined below. The corresponding kernel $K^n(dh^*;h,x_n)$ in
the sense of Section 4b of Chapter IX of [4] is

$$r_n\sum_{m=0}^n p(dh^*|h,m)\,p(m|h,x_n)$$

where 

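1. For $m = 0,\ldots,n$, and for $\theta = \lambda_n^{-1}h$,

$$p(m|h,x_n) = \sum_{y\in\{0,1\}^n:\ \sum_i y^i = m}\ \prod_{i=1}^n\frac{(1-\theta)f_0(x^i)(1-y^i) + \theta f_\epsilon(x^i)y^i}{(1-\theta)f_0(x^i) + \theta f_\epsilon(x^i)}.$$

2. For $m = 0,\ldots,n$, $p(A|h,m) = \mathcal{L}(h^* \in A|h,m)$ is defined by $\mathcal{L}(\lambda_n^{-1}(h^* + h)|h,m) = \mathrm{Beta}(\alpha_1 + m, \alpha_0 + n - m)$.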
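
The two items above describe one sweep of the data augmentation chain: latent labels are drawn given the current proportion, and the proportion is then refreshed from its conjugate Beta law. A minimal Python sketch follows, assuming the two component densities have already been evaluated at the data. The function names `da_step` and `scaled_increment` and their arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def da_step(theta, f0_vals, feps_vals, alpha1, alpha0, rng):
    """One data augmentation sweep: theta -> (m, theta_new).

    f0_vals[i] and feps_vals[i] are the two component densities evaluated
    at the i-th observation (assumed precomputed by the caller).
    """
    # Item 1: each label is Bernoulli with the conditional probability
    # that the observation came from the second component.
    w = theta * feps_vals / ((1.0 - theta) * f0_vals + theta * feps_vals)
    m = int(rng.binomial(1, w).sum())
    # Item 2: conjugate update, theta_new ~ Beta(alpha1 + m, alpha0 + n - m).
    n = len(f0_vals)
    theta_new = rng.beta(alpha1 + m, alpha0 + n - m)
    return m, theta_new

def scaled_increment(theta, theta_new, lam_n):
    # Jump of the scaled chain: h* = lambda_n * (theta_new - theta).
    return lam_n * (theta_new - theta)

rng = np.random.default_rng(0)
m, theta_new = da_step(0.3, np.ones(50), np.ones(50), 1.0, 1.0, rng)
```

The sketch samples $m$ as a sum of independent Bernoulli labels rather than evaluating the sum over label configurations in item 1, which is equivalent in law and avoids enumerating $2^n$ configurations.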

Let

$$b(h,z) = \alpha_1 + hz - h^2I, \qquad c(h,z) = 2h.$$

Let $\{X_t;\ t \ge 0\}$ be a stochastic process on a stochastic basis $(\Omega, \mathcal{F}, \mathbf{F}, P_z)$ with
coefficients $(b(\cdot,z), c(\cdot,z), K(\cdot,z) = 0)$ in the sense of Definition 2.18 of Chapter
III of [4] and the initial distribution $p^*(\cdot|d = z)$ defined in (A.5). Set $M^*(\cdot;z) =
\mathcal{L}(\{X_t;\ t \ge 0\}|P_z)$ and $\mathcal{L}(z) = P := N(0, I)$.
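
To give a feel for the limit process, the following Python sketch simulates a path with an Euler-Maruyama scheme. It assumes that $c$ is read as the instantaneous variance, so that the process solves $dX_t = b(X_t,z)\,dt + \sqrt{c(X_t,z)}\,dW_t$; this reading, the truncation at zero, and all function names and parameter values are assumptions of the example rather than statements of the paper.

```python
import numpy as np

def drift(h, z, alpha1, info):
    # b(h, z) = alpha1 + h*z - h^2 * I
    return alpha1 + h * z - h**2 * info

def diffusion_coeff(h):
    # c(h, z) = 2h, interpreted here as the variance per unit time.
    return 2.0 * h

def euler_maruyama(h0, z, alpha1, info, dt, n_steps, rng):
    """Simulate one discretised path of length n_steps + 1 started at h0."""
    path = np.empty(n_steps + 1)
    path[0] = h0
    for k in range(n_steps):
        h = path[k]
        noise = rng.normal(0.0, np.sqrt(dt))
        h_next = h + drift(h, z, alpha1, info) * dt + np.sqrt(diffusion_coeff(h)) * noise
        # Crude safeguard (assumption): the discretised path is clipped at 0
        # so that the square root stays defined.
        path[k + 1] = max(h_next, 0.0)
    return path

rng = np.random.default_rng(1)
path = euler_maruyama(h0=1.0, z=0.0, alpha1=1.0, info=1.0, dt=1e-3, n_steps=1000, rng=rng)
```

A quick consistency check on this reading: the density proportional to $c^{-1}\exp(\int 2b/c)$ equals $h^{\alpha_1-1}\exp(hz - h^2I/2)$, which is the law (A.5) used as the initial distribution.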

Proposition A.5. Under Assumption 3.1, the coefficients satisfy

$$b_n(h,x_n) = \int K^n(dh^*;h,x_n)\,h^* = b(h,Z_n(x_n)) + o_{P_n}(1), \tag{A.7}$$

$$c_n(h,x_n) = \int K^n(dh^*;h,x_n)\,(h^*)^2 = c(h,Z_n(x_n)) + o_{P_n}(1) \tag{A.8}$$

and

$$d_n(h,x_n) = \int K^n(dh^*;h,x_n)\,(h^*)^4 = o_{P_n}(1) \tag{A.9}$$

locally uniformly in $h$.

Proof. We write $o_{P_n}$ and $O_{P_n}$ to denote local uniform convergence and tightness
in $h$, respectively. Fix $H > 0$.

1. We first prove the convergence of $E(m|h,x_n)$:

$$E(m|h,x_n) - r_nh = hZ_n - h^2I + o_{P_n}(1), \tag{A.10}$$

$$r_n^{-1}E\left((m - E(m|h,x_n))^2\,|\,h,x_n\right) = h + o_{P_n}(1). \tag{A.11}$$

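A simple calculation yields

$$\int_{\{0,1\}}(y-\theta)\,p(dy|x,\theta) = \frac{(1-\theta)\theta\,(f_\epsilon(x) - f_0(x))}{(1-\theta)f_0(x) + \theta f_\epsilon(x)} = \frac{(1-\theta)\theta\,(f_\epsilon/f_0(x) - 1)}{1 + \theta(f_\epsilon/f_0(x) - 1)}.$$

By the above, the left hand side of (A.10) is, for $\theta = \lambda_n^{-1}h$,

$$\sum_{i=1}^n\left(E(y^i|h,x_n) - \theta\right) = (1-\lambda_n^{-1}h)\,h\sum_{i=1}^n\frac{s_n(x^i)}{1 + hs_n(x^i)}$$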

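where $s_n(x) = \lambda_n^{-1}(f_\epsilon(x)/f_0(x) - 1)$. We also have

$$\sum_{i=1}^n\frac{s_n(x^i)}{1+hs_n(x^i)} = \sum_{i=1}^n s_n(x^i) - h\sum_{i=1}^n s_n(x^i)^2 + h^2\sum_{i=1}^n s_n(x^i)^2\,\frac{s_n(x^i)}{1+hs_n(x^i)}.$$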

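The leading two terms on the right hand side are $Z_n - hI + o_{P_n}(1)$ and the
third term is $o_{P_n}(1)$ since it has the same form as (A.3) with one term removed
and with $r(hs_n(x))$ replaced by $s_n(x)/(1 + hs_n(x))$. Thus (A.10) follows. To
show (A.11), observe

$$\int_{\{0,1\}}(y - m(x))^2\,p(dy|x,\theta) = \frac{\theta(1-\theta)f_0(x)f_\epsilon(x)}{((1-\theta)f_0(x) + \theta f_\epsilon(x))^2} = \frac{\theta(1-\theta)\,f_\epsilon/f_0(x)}{(1+\theta(f_\epsilon/f_0(x) - 1))^2}$$

where $m(x) = \int y\,p(dy|x,\theta)$. By this observation, the left hand side of
(A.11) is, by $r_n^{-1} = n^{-1}\lambda_n$,

$$(1-\lambda_n^{-1}h)\,\frac{h}{n}\sum_{i=1}^n\frac{1+\lambda_ns_n(x^i)}{(1+hs_n(x^i))^2}.$$

By Slutsky's lemma with the same argument as for (A.3), the above is

$$\frac{h}{n}\sum_{i=1}^n\left(1+\lambda_ns_n(x^i)\right) + o_{P_n}(1) = h + o_{P_n}(1)$$

which proves (A.11).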

2. Now we check the convergence of the drift and diffusion coefficients. Since
$h^* = \lambda_n[\lambda_n^{-1}(h^* + h) - \lambda_n^{-1}h]$ and $\lambda_nr_n = n$,

$$r_nE(h^*|h,m) = n\left(\frac{\alpha_1+m}{\alpha_1+\alpha_0+n} - \lambda_n^{-1}h\right). \tag{A.12}$$

Recall that $b_n(h,x_n)$ is the expectation of the above value with respect to
$E(\cdot|h,x_n)$. We thus get (A.7) since

$$b_n(h,x_n) = \alpha_1 + E(m|h,x_n) - r_nh + o_{P_n}(1) = \alpha_1 + hZ_n - h^2I + o_{P_n}(1).$$
By definition, $c_n(h,x_n)$ is the expectation of the following with respect to
$E(\cdot|h,x_n)$:

$$r_nE((h^*)^2|h,m) = r_nE(h^*|h,m)^2 + r_nE[(h^* - E(h^*|h,m))^2|h,m]. \tag{A.13}$$

By replacing $\lambda_n^{-1}h = n^{-1}r_nh$ in (A.12) by $n^{-1}E(m|h,x_n)$, the expectation
of the first term of (A.13) is

$$r_n^{-1}E[(m - E(m|h,x_n))^2|h,x_n] + o_{P_n}(1) = h + o_{P_n}(1)$$

by (A.11). Replacing $h^*$ by $h^* + h$, the second term of (A.13) is

$$r_n\lambda_n^2\int\left(x - \frac{\beta_1}{\beta_1+\beta_0}\right)^2\mathrm{Beta}(x;\beta_1,\beta_0)\,dx = r_n\lambda_n^2\,\frac{\beta_1\beta_0}{(\beta_1+\beta_0)^2(\beta_1+\beta_0+1)}$$

where $\beta_1 = \alpha_1 + m$ and $\beta_0 = \alpha_0 + n - m$ and $\mathrm{Beta}(x;\beta_1,\beta_0)$ is the density
of $\mathrm{Beta}(\beta_1,\beta_0)$. It is straightforward to check that the expectation of the
above is $r_n^{-1}E(m|h,x_n) + o_{P_n}(1) = h + o_{P_n}(1)$, which proves (A.8).

3. To prove (A.9), we prepare the following equation for $k = 3, 4$:

$$E\left((m - E(m|h,x_n))^k\,|\,h,x_n\right) = O_{P_n}(r_n^{k-2}). \tag{A.14}$$

If $X \sim \mathrm{Bi}(1,p)$ (binomial distribution with parameter $p$), $|E(X - p)^k| =
|p(1-p)^k + (-p)^k(1-p)| \le 2p$. Using this fact, for $k = 3$, the left hand
side of (A.14) is

$$\left|\sum_{i=1}^n E\left((y^i - E(y^i|h,x_n))^3\,|\,h,x_n\right)\right| \le 2\sum_{i=1}^n E(y^i|h,x_n)$$

which is $O_{P_n}(r_n)$ by the above arguments. For $k = 4$,

$$E[(m - E(m|h,x_n))^4|h,x_n] = \sum_{i=1}^n E[(y^i - E(y^i|h,x_n))^4|h,x_n] + 3\sum_{i\neq j}E[(y^i - E(y^i|h,x_n))^2|h,x_n]\,E[(y^j - E(y^j|h,x_n))^2|h,x_n].$$

By the same arguments, the two terms on the right hand side are $O_{P_n}(r_n)$
and $O_{P_n}(r_n^2)$, respectively. Hence (A.14) follows for $k = 3, 4$.
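4. Last we show (A.9). The left hand side of (A.9) is $r_n\lambda_n^4E(f|h,x_n)$ where
$f(m)$ is

$$\int(x-\lambda_n^{-1}h)^4\,\mathrm{Beta}(x;\beta_1,\beta_0)\,dx = \sum_{l=0}^4\binom{4}{l}\int x^l\,\mathrm{Beta}(x;\beta_1,\beta_0)\,dx\,(-\lambda_n^{-1}h)^{4-l} = \sum_{l=0}^4\binom{4}{l}\frac{\Gamma(\beta_1+l)\Gamma(\beta_1+\beta_0)}{\Gamma(\beta_1)\Gamma(\beta_1+\beta_0+l)}(-\lambda_n^{-1}h)^{4-l}.$$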

By simple algebra, using $\beta_1 + \beta_0 = \alpha_1 + \alpha_0 + n$,

(A + U /^i I ^ g( ^^+1 + — — ) 



'r(/3i)r(/3i+/3o + «) ' ' ' n^+i 

for some constant $c > 0$, and the expectation of the right hand side above has
order $O_{P_n}(r_n^l/n^{l+1})$ since, by (A.10) and (A.14), $E(m^l|h,x_n) = O_{P_n}(r_n^l)$
for $l = 1, 2, 3, 4$. Replacing the ratio of Gamma functions, the left hand
side of (A.9) is

^»^»E + (2)/3r')""'(-^«'/^r'+op„(A-i). 

Then by a simple algebra, it is 

[^-hY\s,x^) + QT-^eE{{^ 
The first term is 

$$r_n^{-3}E\left((\alpha_1 + m - hr_n)^4\,|\,h,x_n\right) = r_n^{-3}E\left((m - E(m|h,x_n) + O_{P_n}(1))^4\,|\,h,x_n\right)$$

which is $O_{P_n}(r_n^{-1})$. Using similar arguments, the second term is $O_{P_n}(r_n^{-1})$
by (A.11). Hence (A.9) follows, which completes the proof.

□ 
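
The convergences (A.7) and (A.8) can be checked numerically, because the first two moments of the scaled jump $h^*$ are available in closed form from (A.12), (A.13) and the Beta variance. The Python sketch below computes $b_n$ and $c_n$ exactly for given likelihood ratios. The function name `da_coefficients` and the toy input (all ratios equal to one, so that $Z_n = 0$ and the limits are $\alpha_1$ and $2h$) are assumptions made for illustration only.

```python
import numpy as np

def da_coefficients(h, ratio, lam_n, alpha1, alpha0):
    """Exact b_n and c_n of the scaled kernel, with ratio[i] = f_eps(x^i) / f_0(x^i)."""
    n = len(ratio)
    r_n = n / lam_n                      # uses lambda_n * r_n = n
    theta = h / lam_n
    # Conditional probability that label i equals one.
    w = theta * ratio / (1.0 - theta + theta * ratio)
    mean_m = w.sum()
    var_m = (w * (1.0 - w)).sum()
    total = alpha1 + alpha0 + n          # beta_1 + beta_0
    # Drift: expectation of (A.12) over m.
    centre = (alpha1 + mean_m) / total - theta
    b_n = n * centre
    # First term of (A.13): r_n * lambda_n^2 * E[((alpha1 + m)/total - theta)^2].
    first = r_n * lam_n**2 * (var_m / total**2 + centre**2)
    # Second term of (A.13): r_n * lambda_n^2 * E[beta_1 * beta_0] / (total^2 (total + 1)).
    e_b1 = alpha1 + mean_m
    e_b1_sq = var_m + e_b1**2
    second = r_n * lam_n**2 * (total * e_b1 - e_b1_sq) / (total**2 * (total + 1.0))
    return b_n, first + second

b_n, c_n = da_coefficients(h=2.0, ratio=np.ones(10**6), lam_n=1000.0, alpha1=1.0, alpha0=1.0)
```

In this toy configuration `b_n` is close to $\alpha_1 = 1$ and `c_n` is close to $2h = 4$, in line with the limits $b(h,0)$ and $c(h,0)$.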

Corollary A.6. The Prokhorov distance between $M^*_n(\cdot;x_n)$ and $M^*(\cdot;Z_n(x_n))$ tends in
$P_n$-probability to 0.

Proof. Consider a product of the spaces $(X_n, \mathcal{X}_n, P_n)$ and write it $(X, \mathcal{X}, P)$.
By a projection $x = (x_1, x_2, \ldots) \mapsto x_n \in X_n$, we consider any function of $x_n$ as
that of $x$. By the $P_n$-tightness of the law of $Z_n$, without loss of generality, we can
assume $Z_n(x) \in K$ $(n = 1, 2, \ldots)$ for any $x \in X$ for a compact set $K$. For any
compact set $C \subset \mathbb{R}_+$,

$$R_{n,C}(x) = \sup_{h\in C}\left\{|b_n(h,x) - b(h,Z_n(x))| + |c_n(h,x) - c(h,Z_n(x))| + d_n(h,x)\right\}$$

tends in $P$-probability to 0 by Proposition A.5. Let $\{C_i;\ i \ge 1\}$ be an increasing
sequence of compact sets such that $\bigcup_{i=1}^\infty C_i = \mathbb{R}_+$. Then $R_n := \sum_{i=1}^\infty 2^{-i}\min\{R_{n,C_i}, 1\}$
tends in $P$-probability to 0. Hence for any subsequence $n_0$ of $\mathbb{N}$, there is a further
subsequence $n_1 = (n_{11}, n_{12}, \ldots)$ such that $R_n$ and $Z_n$ converge to their limits
almost surely. Then by Theorem 4.21 of Chapter IX of [4], the claim is true for
the subsequence. Since this is true for any subsequence, the claim follows. □